{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Running Llama 3 on Mac, Windows or Linux\n",
    "This notebook goes over how you can set up and run Llama 3 locally on a Mac, Windows or Linux using [Ollama](https://ollama.com/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Steps at a glance:\n",
    "1. Download and install Ollama.\n",
    "2. Download and test run Llama 3.\n",
    "3. Use local Llama 3 via Python.\n",
    "4. Use local Llama 3 via LangChain.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. Download and install Ollama\n",
    "\n",
    "On Mac or Windows, go to the Ollama download page [here](https://ollama.com/download) and select your platform to download it, then double click the downloaded file to install Ollama.\n",
    "\n",
    "On Linux, you can simply run on a terminal `curl -fsSL https://ollama.com/install.sh | sh` to download and install Ollama."
   ]
  },
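  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After installing, you can optionally confirm that the Ollama server is up before moving on. Below is a minimal sanity check, assuming Ollama is running locally on its default port 11434."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "# Ollama's local server replies \"Ollama is running\" to a GET on its root path\n",
    "print(requests.get(\"http://localhost:11434\").text)"
   ]
  },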
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Download and test run Llama 3\n",
    "\n",
    "On a terminal or console, run `ollama pull llama3` to download the Llama 3 8b chat model, in the 4-bit quantized format with size about 4.7 GB.\n",
    "\n",
    "Run `ollama pull llama3:70b` to download the Llama 3 70b chat model, also in the 4-bit quantized format with size 39GB.\n",
    "\n",
    "Then you can run `ollama run llama3` and ask Llama 3 questions such as \"who wrote the book godfather?\" or \"who wrote the book godfather? answer in one sentence.\" You can also try `ollama run llama3:70b`, but the inference speed will most likely be too slow - for example, on an Apple M1 Pro with 32GB RAM, it takes over 10 seconds to generate one token using Llama 3 70b chat (vs over 10 tokens per second with Llama 3 8b chat).\n",
    "\n",
    "You can also run the following command to test Llama 3 8b chat:\n",
    "```\n",
    " curl http://localhost:11434/api/chat -d '{\n",
    "  \"model\": \"llama3\",\n",
    "  \"messages\": [\n",
    "    {\n",
    "      \"role\": \"user\",\n",
    "      \"content\": \"who wrote the book godfather?\"\n",
    "    }\n",
    "  ],\n",
    "  \"stream\": false\n",
    "}'\n",
    "```\n",
    "\n",
    "The complete Ollama API doc is [here](https://github.com/ollama/ollama/blob/main/docs/api.md)."
   ]
  },
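  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To confirm which models you have pulled so far, you can also query the server's model list. Below is a short sketch using the `/api/tags` endpoint described in the Ollama API doc linked above, assuming the default local server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "# /api/tags returns a JSON object whose \"models\" field lists the local models\n",
    "for m in requests.get(\"http://localhost:11434/api/tags\").json()[\"models\"]:\n",
    "    print(m[\"name\"], f\"{m['size'] / 1e9:.1f} GB\")"
   ]
  },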
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Use local Llama 3 via Python\n",
    "\n",
    "The Python code below is the port of the curl command above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import json\n",
    "\n",
    "url = \"http://localhost:11434/api/chat\"\n",
    "\n",
    "def llama3(prompt):\n",
    "    data = {\n",
    "        \"model\": \"llama3\",\n",
    "        \"messages\": [\n",
    "            {\n",
    "              \"role\": \"user\",\n",
    "              \"content\": prompt\n",
    "            }\n",
    "        ],\n",
    "        \"stream\": False\n",
    "    }\n",
    "    \n",
    "    headers = {\n",
    "        'Content-Type': 'application/json'\n",
    "    }\n",
    "    \n",
    "    response = requests.post(url, headers=headers, json=data)\n",
    "    \n",
    "    return(response.json()['message']['content'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = llama3(\"who wrote the book godfather\")\n",
    "print(response)"
   ]
  },
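  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The call above sets `\"stream\": false` to receive the whole answer in a single response. If you set `stream` to true instead, Ollama returns one JSON object per line as tokens are generated. Below is a sketch of reading that stream, assuming the same local endpoint as above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import requests\n",
    "\n",
    "def llama3_stream(prompt):\n",
    "    data = {\n",
    "        \"model\": \"llama3\",\n",
    "        \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n",
    "        \"stream\": True\n",
    "    }\n",
    "    # Each line of the streamed body is a standalone JSON object;\n",
    "    # the final one has \"done\": true\n",
    "    with requests.post(\"http://localhost:11434/api/chat\", json=data, stream=True) as response:\n",
    "        for line in response.iter_lines():\n",
    "            if not line:\n",
    "                continue\n",
    "            chunk = json.loads(line)\n",
    "            if chunk.get(\"done\"):\n",
    "                break\n",
    "            print(chunk[\"message\"][\"content\"], end=\"\", flush=True)\n",
    "\n",
    "llama3_stream(\"who wrote the book godfather? answer in one sentence.\")"
   ]
  },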
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. Use local Llama 3 via LangChain\n",
    "\n",
    "Code below use LangChain with Ollama to query Llama 3 running locally. For a more advanced example of using local Llama 3 with LangChain and agent-powered RAG, see [this](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install langchain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.chat_models import ChatOllama\n",
    "\n",
    "llm = ChatOllama(model=\"llama3\", temperature=0)\n",
    "response = llm.invoke(\"who wrote the book godfather?\")\n",
    "print(response.content)\n"
   ]
  },
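  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `ChatOllama` is a standard LangChain chat model, it composes with the rest of the framework. Below is one sketch (not the only way) that pipes a prompt template into the model and a string output parser; it reuses the `llm` defined above and assumes `langchain-core`, which is installed alongside `langchain-community`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "# Compose prompt -> model -> parser with the LangChain expression language\n",
    "prompt = ChatPromptTemplate.from_template(\"Answer in one sentence: {question}\")\n",
    "chain = prompt | llm | StrOutputParser()\n",
    "print(chain.invoke({\"question\": \"who wrote the book godfather?\"}))"
   ]
  }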
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
